Abstract We discuss two projects in non-linear cosmostatistics applicable to very
large surveys of galaxies. The first is a Bayesian reconstruction of galaxy redshifts 
and their number density distribution from approximate, photometric redshift data. 
The second focuses on cosmic voids and uses them to construct cosmic spheres 
that allow reconstructing the expansion history of the Universe using the Alcock- 
Paczynski test. In both cases we find that non-linearities enable the methods or 
enhance the results: non-linear gravitational evolution creates voids and our photo-z 
reconstruction works best in the highest density (and hence most non-linear) por- 
tions of our simulations. 



1 What is cosmostatistics? 

Cosmostatistics is the discipline of using the departures from homogeneity observed 
in astronomical surveys to distinguish between cosmological models. It therefore 
plays a central role in the cosmological agenda for the coming decade, which is to 

• learn about the cosmic beginning; 

• understand the cosmic constituents, in particular Dark Matter and Dark Energy;
and 

• understand cosmological evolution from initial seed perturbations to current
observations.

One of the challenges for cosmostatistics is that any given observable (maps of the 
cosmic microwave background, galaxy survey, etc.) is informative about all these 
goals in some way. 







Fig. 1 Cosmostatistics uses the stochastic departures from homogeneity on all observable scales 
to distinguish between cosmological models. 

We are fortunate to live in a time when the cosmic microwave background
(CMB) is being mapped with high precision from space (by the WMAP [7] and
Planck [9] missions), and ground-based and space-based missions are mapping
out sizable fractions of the observable Universe in exquisite detail and in three
dimensions, across large swaths of the electromagnetic spectrum. Of these two
approaches, we expect the CMB to carry much more signal on very large scales,
whereas probes of the density field should in principle win overall, simply because
a three-dimensional data set contains vastly more modes, which greatly reduces
sample variance.

How do we realize the immense promise of large-scale structure surveys for
constraining cosmological models? A number of known and unknown systematics 
stand between where we are now and the dream of accessing the vast number of 
perturbation modes sampled by tracers of the underlying density field. Many of 
these systematics complicate the relationship between the distribution of tracers and 
the mass distribution we would actually like to probe. 

These complications arise either from the intricate physics of galaxy formation
or from incomplete information in the data (e.g. having access only to approximate
photometric redshift information instead of the much more expensive spectroscopic
redshifts). In addition, the mass density has undergone non-linear dynamical
evolution on length scales below ~20 Mpc/h, which has coupled the perturbation
modes in ways that are non-trivial to model. Non-linear mode coupling erases
information that the mode amplitudes carried about the state of the early Universe
from which they arose. On the largest scales the limits are set by causality and
hence the finite volume of the observable Universe.

Most people would agree on the impracticality of incorporating fully non-linear 
gravitational evolution into cosmological inference, let alone a fully physical model 
of galaxy formation. So the challenge is to find ways of looking at the data that are 
robust to these systematics. 

When it comes to dealing with incomplete information, the challenge is to pro- 
duce a joint analysis with uncontroversial prior information that allows reconstitut- 
ing some of the information that has not been captured in the data. 

In this talk we will highlight two recent papers which give examples of these
two approaches. In one case [3], we develop a Bayesian approach to improving
photometric redshift estimates (while simultaneously estimating the density of the
tracers). The prior information we assume to achieve this information recovery is
local isotropy of the tracer distribution.

In the second paper [5] we define a new observable to probe the physical
properties of dark energy: stacked voids. In this case we choose a very specific
pre-processing step to extract features of the data which should be robust to galaxy
bias and to non-linearity. The approach explicitly projects out the details of the
tracer distribution in the non-linear density field to obtain nearly spherical objects
that nearly co-move with the expansion. These objects serve as the basis of a
powerful and purely geometrical test of the expansion history of the Universe. Again,
local isotropy underlies this approach, which posits that underdense regions are not
preferentially oriented with respect to an observer's line of sight.



2 Bayesian inference from photometric redshift surveys 

The vast majority of ongoing and future surveys (CFHTLS, DES, Pan-STARRS,
LSST) are or will be photometric. This is a simple consequence of the cost of taking
a galaxy spectrum with current technology. Photometric redshift errors of
$\Delta z \approx 0.03$, the current state of the art, translate into smearing along
the line of sight on scales of ~100 Mpc. Such errors are not detrimental to certain
kinds of science but will cause any structure smaller than 100 Mpc to be wiped out,
as illustrated in Figure 2.
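
As a rough check of the quoted smearing scale, one can convert the redshift error
into a distance with the low-redshift Hubble law. This is only an order-of-magnitude
sketch: the use of the linear Hubble law and of the value H_0 = 71 km/s/Mpc (the
value adopted for the simulations of Section 3) are assumptions made here for
illustration.

```latex
\[
\sigma_r \simeq \frac{c\,\Delta z}{H_0}
  = \frac{(3.0\times 10^{5}\ \mathrm{km\,s^{-1}})\times 0.03}
         {71\ \mathrm{km\,s^{-1}\,Mpc^{-1}}}
  \approx 127\ \mathrm{Mpc}.
\]
```

At higher redshift H(z) exceeds H_0, so the corresponding comoving smearing is
somewhat smaller, in line with the ~100 Mpc scale quoted above.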

Looking at the trivial density estimate obtained by binning the photometric tracers,
shown in Figure 2, it is immediately clear that the line-like finger-of-god artifacts
introduced by photo-z smearing are very recognizable, since they break local isotropy,
a core element of our cosmology. Since they stand out so visibly, we wondered whether
they could be removed.



[Figure 2 panels, left to right: $\ln(2+\delta)$, galaxy number counts, galaxy number counts; horizontal axes: x [Mpc].]



Fig. 2 From an n-body simulation to the simulated photo-z survey: the particle density in the sim- 
ulation (left), after application of the mask (center), and after simulation of photo-z uncertainties 
(right). 



In the following we will often refer to the tracers as galaxies, but the nature of 
the tracer is of no importance to the functioning or implementation of the algorithm. 



2.1 A simple model of a photo-z catalogue 

First we build a hierarchical model for the distribution of tracers. A simple approach
is to treat the points as a realization of an inhomogeneous Poisson process. The
intensity function of the Poisson process is the underlying number density field, which
in turn is a correlated, statistically isotropic, log-normal random field. For the
purposes of this exercise we will assume that the correlation function (or equivalently
the power spectrum P(k)) is known. Relaxing this assumption will be the subject of a
future study.
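
The first two levels of this hierarchy can be illustrated with a small forward
simulation. The following is a minimal two-dimensional sketch, not the code used for
the analysis: the function name, the grid size, the power-law spectrum and the
normalization by a parameter `sigma` are all hypothetical choices made for
illustration.

```python
import numpy as np

def lognormal_poisson_sample(n_grid, box_size, power_spectrum, mean_count, sigma, rng):
    """Draw tracer counts on a 2-D grid from a Poisson log-normal model.

    power_spectrum : callable giving the (assumed known) shape P(k) for k > 0
    mean_count     : mean number of tracers per cell
    sigma          : rms of the Gaussian field (hypothetical normalization)
    """
    # Wavenumbers of the FFT grid; the amplitude depends only on |k| (isotropy).
    k1d = 2.0 * np.pi * np.fft.fftfreq(n_grid, d=box_size / n_grid)
    kx, ky = np.meshgrid(k1d, k1d, indexing="ij")
    k = np.hypot(kx, ky)
    amp = np.zeros_like(k)
    amp[k > 0] = np.sqrt(power_spectrum(k[k > 0]))

    # Filter white noise to obtain a correlated, zero-mean Gaussian field.
    white = rng.standard_normal((n_grid, n_grid))
    gauss = np.fft.ifft2(np.fft.fft2(white) * amp).real
    gauss *= sigma / gauss.std()

    # Level 1: log-normal intensity (number density per cell).
    intensity = mean_count * np.exp(gauss - 0.5 * sigma**2)
    # Level 2: inhomogeneous Poisson sampling of the tracers.
    counts = rng.poisson(intensity)
    return intensity, counts

rng = np.random.default_rng(0)
intensity, counts = lognormal_poisson_sample(
    n_grid=64, box_size=1000.0, power_spectrum=lambda k: k**-1.5,
    mean_count=2.0, sigma=1.0, rng=rng)
```

The exponential guarantees a positive intensity, which is what makes the log-normal
field a convenient prior for a number density.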

The third level in the model hierarchy describes photo-z distortions, which modify
the galaxy positions along the radial lines of sight. It is assumed that the redshift
uncertainties are specified in terms of a pdf for each tracer. These photo-z pdfs are
assumed to be the output of an earlier analysis step which uses any information
available, except the spatial distribution of the tracers in the catalog. All
photometric information for the galaxy, including any morphological features that can
be discerned in the images, is fair game.



2.1.1 Implementation

This hierarchical model can be straightforwardly implemented. The challenge is 
to explore the posterior density in an efficient manner since the parameter space 
is enormous: approximately 16 million parameters for the number density and 20 
million galaxy redshifts. We choose a block Gibbs-Metropolis-Hastings sampling 
approach with the following steps: 

1. Sample the number density given the current galaxy redshifts. We draw from the
conditional posterior of the number density assuming that the current "guess" of
the galaxy redshifts is correct. This is a solved problem [4]; it uses a Hamiltonian
sampling approach to update the number density field using the galaxy positions
and incorporating the correlated log-normal prior.

2. Sample the galaxy redshifts given the number density. The redshift posteriors for
the galaxies are conditionally independent given the number density field. This
feature allows parallelizing this step over the number of galaxies. Each galaxy
performs one step of a Metropolis-Hastings Markov Chain Monte Carlo along
the line of sight. The conditional posterior for each galaxy is the product of the
input photo-z pdf for this galaxy and the number density.

Conditional independence is the key feature that allows this algorithm to scale to 
tens of millions of galaxies. From the perspective of the message passing paradigm 
of Bayesian inference, the number density field communicates the information about 
all the other galaxies to each individual one. 
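
The redshift-sampling step can be sketched as follows. This is an illustrative toy
version under stated assumptions, not the implementation used in [3]: it assumes
Gaussian photo-z pdfs expressed directly in radial distance, a density field stored
on a regular grid with one row per line of sight, and a symmetric Gaussian proposal.
All names are hypothetical.

```python
import numpy as np

def mh_redshift_step(r, los, r_obs, sigma, density, cell, step, rng):
    """One Metropolis-Hastings update of every galaxy's radial position.

    r            : current radial positions, shape (n_gal,)
    los          : integer index of each galaxy's line of sight (row of `density`)
    r_obs, sigma : centre and width of each galaxy's (assumed Gaussian) photo-z pdf
    density      : number density grid, shape (n_los, n_cells), strictly positive
    cell         : radial cell size; step : proposal width
    """
    n_cells = density.shape[1]

    def log_post(pos):
        # Conditional log-posterior: log photo-z pdf + log number density.
        idx = np.floor(pos / cell).astype(int)
        inside = (idx >= 0) & (idx < n_cells)
        idx = np.clip(idx, 0, n_cells - 1)
        log_p = -0.5 * ((pos - r_obs) / sigma) ** 2 + np.log(density[los, idx])
        return np.where(inside, log_p, -np.inf)

    # All galaxies are updated at once: given the density they are independent.
    proposal = r + step * rng.standard_normal(r.shape)
    log_ratio = log_post(proposal) - log_post(r)
    accept = np.log(rng.random(r.shape)) < log_ratio
    return np.where(accept, proposal, r)
```

Because each galaxy only reads the shared density grid, the vectorized update above
could equally be distributed over processes, which is the property the text relies
on. In a full sampler this step would alternate with a density update given the new
positions.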



2.2 Results 

Figures 3 and 4 illustrate our approach. The first figure shows that even within a
few steps the samples of the number density become isotropized. In the second
figure we track the redshift of an example galaxy as the sampler explores the range of
possible reconstructions. The galaxies explore along their line of sight in a number
density field that, in turn, fluctuates in response to the changing galaxy positions.

The results are encouraging. In high density regions galaxy redshift uncertainties
shrink by a factor of several. When a galaxy could reside in one of several
concentrations lying along the line of sight, the output pdf is multi-modal. Even so, the
reconstructed redshift posteriors of the galaxies are generally far more informative
than the inputs coming from photometric redshift estimators.

In order to summarize the result of the reconstruction we form the posterior mean
estimator, the average of the number density field realizations that are explored by
the sampler. We can compare this reconstruction with the input map to assess how well
it reproduces the features of that map. Figure 5 shows the k-space cross-correlations
between the reconstructed and the input field. It is clear that the method is very
successful in the high density parts of the sky.



Fig. 4 Constrained realizations of the reconstructed density field. The data were
simulated using an n-body simulation and the reconstruction assumes the
Poisson-lognormal prior with isotropic correlations.
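
The posterior mean and the k-space cross-correlation between a reconstruction and
the input field can be computed along the following lines. This is a minimal sketch
with hypothetical function names; it bins modes by |k| in grid units and does not
reproduce the overdensity thresholds used for Figure 5.

```python
import numpy as np

def posterior_mean(samples):
    """Average of density realizations; samples has shape (n_samples, ...)."""
    return samples.mean(axis=0)

def cross_correlation(field_a, field_b, n_bins=8):
    """Cross-correlation coefficient r(k) = P_ab / sqrt(P_aa P_bb) in |k| bins.

    Wavenumbers are in cycles per grid cell (0 to 0.5); empty bins return NaN.
    """
    fa = np.fft.fftn(field_a - field_a.mean())
    fb = np.fft.fftn(field_b - field_b.mean())
    grids = np.meshgrid(*[np.fft.fftfreq(n) for n in field_a.shape], indexing="ij")
    k = np.sqrt(sum(g**2 for g in grids)).ravel()

    edges = np.linspace(0.0, 0.5, n_bins + 1)
    which = np.digitize(k, edges) - 1
    p_ab = (fa * np.conj(fb)).real.ravel()
    p_aa = (np.abs(fa) ** 2).ravel()
    p_bb = (np.abs(fb) ** 2).ravel()

    r = np.full(n_bins, np.nan)
    for i in range(n_bins):
        sel = (which == i) & (k > 0)
        denom = np.sqrt(p_aa[sel].sum() * p_bb[sel].sum())
        if denom > 0:
            r[i] = p_ab[sel].sum() / denom
    return 0.5 * (edges[:-1] + edges[1:]), r
```

A value of r(k) close to one in a given bin means that the reconstruction recovers
both the amplitude pattern and the phases of the input field on that scale.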



2.3 Discussion and conclusions 

The first main point of this talk is that we demonstrated the technical achievement 
of running a fully Bayesian analysis of a simulated data set with tens of millions of 
galaxies, and density fields represented on tens of millions of grid zones. The scale 
of this application corresponds to that of the current generation of available surveys, 
so it should be feasible to apply this approach to existing data. 

The second key issue is to test whether our analysis is sensitive to model
misspecification, since the real data will not follow the correlated log-normal Poisson
model. Our initial tests (of code correctness) used simulations that were consistent
with the prior assumptions. These tests were passed. We do not show these tests here
because the prior produces density fields that are clearly not realistic, missing much
of the filamentary structure which is characteristic of the cosmic web.



Fig. 5 The reconstructed density recovers the small scale features of the input density
very well in high density regions. The figure shows the cross-correlation between the
input field and the reconstructed density as a function of wave number (in h/Mpc).
Different lines correspond to different thresholds of overdensity.

The work we present in this talk (and described in detail in Jasche & Wandelt)
uses data simulated from an n-body simulation. Our results demonstrate that the
reconstruction is successful in spite of using an approximate model.

The key feature underlying the reconstruction is clearly the ability to build in the
prior assumption of isotropic correlations in the underlying cosmological number
density field of the tracers. A secondary feature is the assumption of the shape of
the correlations. What we show is that modeling those two aspects of the data results
in acceptable reconstructions that significantly improve the redshift information for
each galaxy. It is also true that a better model including the morphological features
of a realistic, gravitationally evolved number density would likely improve upon our
results, since the differences between a correlated Poisson log-normal sample and a
physical sample drawn from an n-body simulation are easily visible by eye. But it
is clear that the reconstructions are not highly sensitive to the details of the assumed
prior, as long as the two salient features of correlation and isotropy are included for
the density field and we posit a simple statistical relationship of the tracers to the
underlying density, in this case the inhomogeneous Poisson model.

Our approach is completely independent of and complementary to the means by
which the photometric redshift is derived. The method is ready for tests on realistic
data where the photo-z information will be specified in terms of a different pdf for
each galaxy.

As a consequence the method will be able to benefit from those tracers whose
redshifts are better determined than others. In particular we can merge the
advantages of a large number of galaxies in photometric samples and the accuracy of
spectroscopic samples! We will explore this idea further in follow-up studies.

This inference problem is of particular interest because it is an example where
combining millions of noisy measurements with a physical prior, namely the
assumption of isotropic correlations, produces a decisive gain in information.

In the second part of the talk we will see another application of the notion of 
statistical isotropy - this time to the construction of an estimator for the expansion 
history of the Universe. 



3 Precision Cosmography with Cosmic Voids 

Understanding the physical properties of dark energy is a major goal of modern 
cosmology. There are essentially two distinct approaches to reaching this goal: cos- 
mography and tracing structure formation. 

Cosmography. The cosmography approach constrains dark energy properties
using precision measurements of the expansion geometry of the Universe.
Einstein's equation relates the geometrical properties of our Universe to its content.
Since "dark energy" is just a placeholder for the terms in Einstein's equation that
drive the observed accelerated expansion of the Universe, precision cosmographical
measurements can tell us about the time dependence of these terms and hence about
the value and rate of change of the equation of state parameter.

Tracing structure formation. The expansion of the universe has an impact on the
rate at which primordial perturbations amplify. These perturbations then form
structures through non-linear gravitational evolution, galaxy formation, etc. Observing
the statistical properties (number, size, etc.) of these structures as a function of
redshift constrains the growth of structure, and hence the expansion history, which is
informative about the properties of dark energy.

It is clear from this description that geometrical approaches are more direct. In 
addition, approaches relying on the statistical measures of the amount of structure 
in the universe inevitably require a detailed understanding of the processes that re- 
late the formed structures to the underlying perturbation amplitude. These processes 
(e.g. galaxy formation) can be highly complex and deeply non-linear and are re- 
search areas in themselves. 

Geometrical approaches function by constructing standards out of observables
(or combinations of observables) that can be modeled reliably, such as standard
candles (as in the case of type Ia supernovae), standard rulers (as in the case of Baryon
Acoustic Oscillations (BAO)) or time standards (such as the (differences of) ages of
galaxies).



3.1 The stacked voids Alcock-Paczynski test

The Alcock-Paczynski (AP) test [2] requires a different standard: "standard, co- 
expanding spheres." One way to construct such standard spheres is through appeal- 
ing to the statistical isotropy of the cosmological perturbations. In that case, correla- 
tions should depend only on the length, but not the direction of the vector connecting 
the two points being correlated. If the tracers that are being correlated did not move, 
any anisotropy in the correlation function could be interpreted as being due to the 
cosmological expansion at the redshift of the correlated objects. 
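
For orientation, the geometrical content of the test can be written down explicitly.
The relation below is the standard textbook form rather than a formula quoted from
the papers discussed here; the symbols R (comoving radius of the sphere), D_M(z)
(comoving transverse distance) and H(z) (expansion rate) are introduced for this
illustration only.

```latex
\[
\Delta z = \frac{H(z)}{c}\,R, \qquad
\Delta\theta = \frac{R}{D_M(z)}, \qquad
\frac{\Delta z}{\Delta\theta} = \frac{H(z)\,D_M(z)}{c}.
\]
```

Here \Delta z is the extent of the sphere in redshift and \Delta\theta its angular
extent. The ratio is independent of R, so an object known to be spherical measures
the product H(z) D_M(z) without any knowledge of its physical size.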

The key difficulty in constructing standard spheres is peculiar velocity effects.
Any tracers that happen to lie in a gravitationally bound structure will have velocities
set by the depth of the gravitational potential well of the structure. For
clusters or groups of galaxies the resulting finger-of-god effect in redshift space
dominates the cosmic expansion signal by an order of magnitude. Constructing an
Alcock-Paczynski test would therefore require a separate high precision measurement
of the depth and shape of the potential well of any structures whose parts were used
in the construction of the test.

So far the main work-around has been to use only very long range correlations of
order 100 $h^{-1}$ Mpc, where peculiar velocity effects become sub-dominant compared
to the cosmic expansion effect and where the baryon sound speed at radiation drag
leads to a peak in the correlation function. The downside of limiting oneself
to such large scales is that the statistical constraints will depend on the number
of independent correlation volumes in the survey volume, which limits the number of
perturbation modes that can be used to arrive at the dark energy constraints and
therefore leads one to consider extremely large surveys.

In this talk we propose a new way of constructing standard spheres: stacking
cosmic voids. While the AP test had been discussed for especially spherical individual
voids [10], stacking many voids guarantees spherical symmetry, since isotropy
prevents cosmic voids from pointing at us (or away from us) preferentially. Finding
voids in redshift shells, extracting them from the survey, co-centering them and
stacking them therefore gives rise to spherically symmetric underdensities.

There are several advantages to using cosmic voids: 

• Voids are simple: peculiar velocities in and around voids are small compared 
to the cosmic expansion. We find that they give a 16% systematic effect on our 
reconstructed Hubble diagram, with a very mild dependence on void size and 
redshift. 

• Voids are small: a typical void size is 10 $h^{-1}$ Mpc. For a dense enough survey
the number of voids per unit volume that can be detected is therefore of order
1000 times larger than the number of BAO correlation volumes.

• Voids remember: we find that voids have a well-ordered phase space - all they do 
is empty themselves out. 

We use the term cosmic voids not to describe regions that are entirely empty, but 
regions that are underdense basins of repulsion in the cosmic density field. 






Wandelt et al. 



In order to demonstrate the promise of stacked voids for constructing a powerful 
AP test we solved the following problems: 

1. create a suitable void definition: a modified ZOBOV algorithm [8] (see Figure 6);

2. define a method to add voids into stacks labeled by size and redshift, which both
enhances signal to noise and sphericalizes them (see Figure 7);

3. determine the number of voids that would be available to this method in an
observed cosmological volume (see Figure 8); and

4. measure their stretch along the line of sight in order to obtain the expansion
history of the universe (see Figures 9 and 10).

Details can be found in our main paper [5]. 
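
Steps 2 and 4 of the list above can be illustrated with a toy calculation. The
sketch below is not the estimator of [5]: it centers each void on the mean position
of its tracers and measures the stretch from second moments of the stack, both of
which are simplifying assumptions made here, and all names are hypothetical.

```python
import numpy as np

def stack_voids(voids):
    """Co-center voids and concatenate their tracers into one stack.

    voids : list of (n_i, 3) position arrays; column 2 is the line of sight.
    Each void is centered on the mean of its own tracer positions (an assumption).
    """
    return np.vstack([v - v.mean(axis=0) for v in voids])

def stretch(stack):
    """Ratio of rms extent along the line of sight to rms transverse extent."""
    los = np.mean(stack[:, 2] ** 2)
    transverse = 0.5 * np.mean(stack[:, 0] ** 2 + stack[:, 1] ** 2)
    return np.sqrt(los / transverse)

# Toy data: individually aspherical but randomly shaped "voids".
rng = np.random.default_rng(3)
voids = [rng.standard_normal((50, 3)) * rng.uniform(0.5, 1.5, size=3)
         for _ in range(400)]
# The same voids with an artificial 20% stretch along the line of sight.
stretched = [v * np.array([1.0, 1.0, 1.2]) for v in voids]

stretch_iso = stretch(stack_voids(voids))
stretch_los = stretch(stack_voids(stretched))
```

Although no individual toy void is spherical, the stack is close to isotropic, and a
uniform distortion along the line of sight shows up directly in the measured ratio.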




Fig. 6 The results of our void finder in a slice of an n-body simulation. The void finder constructs
a hierarchical structure of voids. Each patch is a void, colored according to its level in the void
hierarchy. When collecting voids in a size bin during the stacking procedure, the algorithm traverses
the tree depth first and marks and returns the first void it finds that satisfies the
size criterion.
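
The tree traversal described in the caption of Fig. 6 can be sketched as a pre-order
depth-first search. The data structure and names below are hypothetical, and the
choice to keep descending into the children of an already marked void is an
assumption of this sketch rather than a documented property of the void finder.

```python
from dataclasses import dataclass, field

@dataclass
class Void:
    radius: float
    children: list = field(default_factory=list)
    used: bool = False  # set once the void has been placed in a stack

def first_void_in_bin(node, r_min, r_max):
    """Depth-first search for the first unmarked void with r_min <= radius < r_max.

    The void that is found is marked as used and returned; None if there is none.
    """
    if not node.used and r_min <= node.radius < r_max:
        node.used = True
        return node
    for child in node.children:
        found = first_void_in_bin(child, r_min, r_max)
        if found is not None:
            return found
    return None

# Small hypothetical hierarchy: a 20-unit parent void with nested sub-voids.
sub = Void(6.0)
a = Void(8.0, children=[sub])
b = Void(7.0)
root = Void(20.0, children=[a, b])
picked = [first_void_in_bin(root, 5.0, 10.0) for _ in range(4)]
```

Repeated calls return each qualifying void once, in depth-first order, which is one
simple way to avoid adding the same void to a stack twice.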



We tested these methods in a series of three pure dark matter N-body simulations
with different realizations of the initial conditions but the same cosmology. The
volume of each simulation is given by a cube of side L = 500 $h^{-1}$ Mpc. Each simulation
had $N = 512^3$ particles. We adopted a $\Lambda$CDM-WMAP7 cosmology with the following
parameters: $\Omega_b h^2 = 0.02258$, $\Omega_c h^2 = 0.1108$,
$H_0 = 71$ km s$^{-1}$ Mpc$^{-1}$, $w = -1$, $n_s = 1$, $A_s = 2.34 \times 10^{-9}$.
This corresponds to $\Omega_b = 0.045$, $\Omega_M = 0.264$, $\sigma_8 = 0.84$.
Each particle had a mass $m_p = 2.05 \times 10^{11}\,h^{-1}\,M_\odot$. The transfer
function for density fluctuations for this cosmology was computed using CAMB [6]. The
initial conditions were generated using ICGEN,$^1$ a code which uses the transfer
function to generate a density field from the primordial power spectrum.



Fig. 7 A void stack for 8 $h^{-1}$ Mpc voids. Left: the stack after fitting and removing the cosmic expansion
effect, but without including peculiar velocities in the simulation. We find our profile agrees well
with that found in [13]. Right: The stack when peculiar velocities are included. The same cosmic
expansion has been removed as in the left panel. Careful inspection shows that peculiar velocities
lead to a small net compression of the void stack along the line of sight.



Fig. 8 Our simulation results for number densities of cosmic voids as a function of redshift for
voids of different sizes. These simulation results agree with the model described in [11].



[Figure legend: Simulation and Theory curves for R = 6 $h^{-1}$ Mpc and R = 8 $h^{-1}$ Mpc.]



Fig. 9 The measured void stretch expansion history as a function of redshift for voids of 6, 8,
and 14 $h^{-1}$ Mpc (from left to right and top to bottom) for three simulations. The long-dashed line
shows the result for the simulated cosmology. No peculiar velocities were included in the mock
catalogs used for these plots. The lower right panel shows the result for 8 $h^{-1}$ Mpc voids for mocks
with peculiar velocities and without any correction for the peculiar velocity effect. The lack of redshift
dependence of the resulting bias is clear. The same plot after debiasing is shown in Figure 10.



1 Available from http://www.iap.fr/users/lavaux/. 

Fig. 10 Void stretch expansion history inferred from 8 $h^{-1}$ Mpc voids after the correction of a
peculiar velocity bias. There is no evidence for residual bias at the level of our simulations.



3.2 Discussion and Conclusion 

Based on these results we performed a Fisher matrix forecast of the statistical
constraints on the dark energy equation of state parameter $w_p$ and its rate of change
$w_a$ that we would expect from Euclid. We quantify the answer in terms of the figure
of merit defined by the Dark Energy Task Force [1], i.e. the relative reduction in the
area of the uncertainty ellipse for these two quantities. The result is exciting: we
find that the stacked void Alcock-Paczynski test has the potential to significantly
enhance the power of the proposed (and now selected) Euclid spacecraft to constrain
dark energy phenomenology.
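
How a figure of merit follows from a Fisher matrix can be sketched in a few lines.
The convention used below, the inverse square root of the determinant of the
marginalized two-parameter covariance, is proportional to the inverse area of the
uncertainty ellipse; the constant of proportionality depends on convention. The
matrices are made-up numbers for illustration and are not forecasts from this work.

```python
import numpy as np

def figure_of_merit(fisher, i, j):
    """Inverse-area figure of merit for parameters i and j.

    The Fisher matrix is inverted first, so all other parameters are
    marginalized over; the result is 1 / sqrt(det C_ij).
    """
    cov = np.linalg.inv(fisher)
    sub = cov[np.ix_([i, j], [i, j])]
    return 1.0 / np.sqrt(np.linalg.det(sub))

# Hypothetical Fisher matrices for two dark energy parameters.
fisher_base = np.array([[40.0, -10.0], [-10.0, 5.0]])
fisher_extra = np.array([[10.0, 0.0], [0.0, 2.0]])

# Independent probes combine by adding their Fisher matrices.
fom_base = figure_of_merit(fisher_base, 0, 1)
fom_combined = figure_of_merit(fisher_base + fisher_extra, 0, 1)
gain = fom_combined / fom_base
```

The ratio `gain` is the relative reduction in ellipse area obtained by adding the
second probe, which is the sense in which an improvement in the figure of merit is
quoted in the text.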

On the face of it, cosmic voids have the potential to provide a far more powerful
constraint on dark energy than measurements of the Baryon Acoustic Oscillation
scale, by up to an order of magnitude. This large increase of information is easily
understood by comparing the number of modes probed by voids with the number probed by
BAOs: the ratio scales roughly as the third power of the ratio of the BAO scale to the
scale of the smallest usable voids, i.e. ~1000. The area of the parameter constraints
scales as the square root of the number of modes, i.e. ~30. When projected into the
$w_a$-$w_p$ plane using the Fisher matrix formalism for the Euclid wide survey, we find
an improvement over BAO on those parameters by a factor of ~10.
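
The mode-counting estimate can be made explicit with the scales quoted earlier in
the text. Taking 100 $h^{-1}$ Mpc for the BAO scale and 10 $h^{-1}$ Mpc for the
smallest usable voids is an assumption made here for illustration.

```latex
\[
\frac{N_{\rm void}}{N_{\rm BAO}}
  \sim \left(\frac{100\,h^{-1}\,\mathrm{Mpc}}{10\,h^{-1}\,\mathrm{Mpc}}\right)^{3}
  = 10^{3},
\qquad
\sqrt{10^{3}} \approx 32 .
\]
```

The square root reproduces the factor of roughly 30 quoted above; the forecast
improvement of about 10 is smaller than this idealized estimate.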

We expect our stacked void shape measurements to be robust to galaxy bias, as
the method is purely geometrical and relies on the topology of the density field [12].
In fact, it is possible that biased tracers of the density enhance the contrast of
voids and therefore enhance the void detection rate. These expectations remain to be
verified on more realistic mock catalogs and real data.

Based on our Fisher matrix forecasts, the stacked voids technique promises a
remarkable increase in the figure of merit from Euclid when compared to the
combined results from all other probes using Euclid data (BAO, weak lensing,
type Ia supernovae, cluster counts). The Alcock-Paczynski test using stacked voids
is therefore potentially a significant addition to the portfolio of major dark energy
probes, and it merits further detailed studies focused on additional real-world
systematics and optimal survey design.